2024-09-19
ohio20 <- read_xlsx("c08/data/ohio_2020.xlsx") |>
  mutate(behavior = Hmisc::cut2(rk_behavior, g = 4),
         clin_care = Hmisc::cut2(rk_clin_care, g = 3)) |>
  mutate(behavior = fct_recode(behavior,
                               "Best" = "[ 1,23)", "High" = "[23,45)",
                               "Low" = "[45,67)", "Worst" = "[67,88]")) |>
  mutate(clin_care = fct_recode(clin_care,
                                "Strong" = "[ 1,31)", "Middle" = "[31,60)",
                                "Weak" = "[60,88]")) |>
  select(FIPS, state, county, outcomes, behavior, clin_care,
         everything())

ohio20 (88 rows and 12 variables, 12 shown)
ID | Name | Type | Missings | Values | N
---+--------------+-------------+----------+-----------------+------------
1 | FIPS | character | 0 (0.0%) | 39001 | 1 ( 1.1%)
| | | | 39003 | 1 ( 1.1%)
| | | | 39005 | 1 ( 1.1%)
| | | | 39007 | 1 ( 1.1%)
| | | | 39009 | 1 ( 1.1%)
| | | | 39011 | 1 ( 1.1%)
| | | | 39013 | 1 ( 1.1%)
| | | | 39015 | 1 ( 1.1%)
| | | | 39017 | 1 ( 1.1%)
| | | | 39019 | 1 ( 1.1%)
| | | | (...) |
---+--------------+-------------+----------+-----------------+------------
2 | state | character | 0 (0.0%) | Ohio | 88 (100.0%)
---+--------------+-------------+----------+-----------------+------------
3 | county | character | 0 (0.0%) | Adams | 1 ( 1.1%)
| | | | Allen | 1 ( 1.1%)
| | | | Ashland | 1 ( 1.1%)
| | | | Ashtabula | 1 ( 1.1%)
| | | | Athens | 1 ( 1.1%)
| | | | Auglaize | 1 ( 1.1%)
| | | | Belmont | 1 ( 1.1%)
| | | | Brown | 1 ( 1.1%)
| | | | Butler | 1 ( 1.1%)
| | | | Carroll | 1 ( 1.1%)
| | | | (...) |
---+--------------+-------------+----------+-----------------+------------
4 | outcomes | numeric | 0 (0.0%) | [-2.05, 2.17] | 88
---+--------------+-------------+----------+-----------------+------------
5 | behavior | categorical | 0 (0.0%) | Best | 22 ( 25.0%)
| | | | High | 22 ( 25.0%)
| | | | Low | 22 ( 25.0%)
| | | | Worst | 22 ( 25.0%)
---+--------------+-------------+----------+-----------------+------------
6 | clin_care | categorical | 0 (0.0%) | Strong | 30 ( 34.1%)
| | | | Middle | 29 ( 33.0%)
| | | | Weak | 29 ( 33.0%)
---+--------------+-------------+----------+-----------------+------------
7 | rk_behavior | numeric | 0 (0.0%) | [1, 88] | 88
---+--------------+-------------+----------+-----------------+------------
8 | rk_clin_care | numeric | 0 (0.0%) | [1, 88] | 88
---+--------------+-------------+----------+-----------------+------------
9 | rural | numeric | 0 (0.0%) | [0.58, 100] | 88
---+--------------+-------------+----------+-----------------+------------
10 | income | numeric | 0 (0.0%) | [40416, 103536] | 88
---+--------------+-------------+----------+-----------------+------------
11 | trump16 | numeric | 0 (0.0%) | [0.31, 0.81] | 88
---+--------------+-------------+----------+-----------------+------------
12 | somecollege | numeric | 0 (0.0%) | [20.38, 85.07] | 88
--------------------------------------------------------------------------
ggplot(ohio20, aes(x = outcomes, y = behavior)) +
  geom_violin() +
  geom_boxplot(aes(fill = behavior), width = 0.2) +
  stat_summary(fun = mean, geom = "point",
               shape = 16, size = 3, col = "white") +
  scale_fill_metro_d() +
  guides(fill = "none") +
  labs(x = "Health Outcomes", y = "Behavior Group",
       title = "Health Outcomes by Behavior Group")

Do we see meaningful differences in population means of outcomes across behavior groups?
Parameter | Coefficient | SE | 90% CI | t(84) | p
------------------------------------------------------------------------
(Intercept) | 0.96 | 0.11 | [ 0.78, 1.15] | 8.74 | < .001
behavior [High] | -0.71 | 0.16 | [-0.97, -0.45] | -4.55 | < .001
behavior [Low] | -1.14 | 0.16 | [-1.40, -0.88] | -7.32 | < .001
behavior [Worst] | -2.01 | 0.16 | [-2.27, -1.75] | -12.85 | < .001
Estimated Marginal Means
behavior | Mean | SE | 90% CI
----------------------------------------
Best | 0.96 | 0.11 | [ 0.78, 1.15]
High | 0.25 | 0.11 | [ 0.07, 0.44]
Low | -0.18 | 0.11 | [-0.36, 0.01]
Worst | -1.04 | 0.11 | [-1.22, -0.86]
Marginal means estimated at behavior
There are two (main) methods we’ll study to estimate the difference between specific pairs of means with uncertainty intervals, while dealing with the problem of multiple comparisons.
Just for fun, we’ll use a 99% confidence level today.
Are there meaningful differences among the behavior group means? ANOVA suggests that the means aren’t likely to all be the same, but we really want to make pairwise comparisons…
Estimated Marginal Means
behavior | Mean | SE | 99% CI
----------------------------------------
Best | 0.96 | 0.11 | [ 0.67, 1.26]
High | 0.25 | 0.11 | [-0.04, 0.54]
Low | -0.18 | 0.11 | [-0.47, 0.11]
Worst | -1.04 | 0.11 | [-1.33, -0.75]
Marginal means estimated at behavior
For example, is Best meaningfully different from High?
This approach makes no adjustment for the fact that you are making multiple comparisons simultaneously.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 99% CI | SE | t(84) | p
-------------------------------------------------------------------
Best | High | 0.71 | [0.30, 1.12] | 0.16 | 4.55 | < .001
Best | Low | 1.14 | [0.73, 1.55] | 0.16 | 7.32 | < .001
Best | Worst | 2.01 | [1.59, 2.42] | 0.16 | 12.85 | < .001
High | Low | 0.43 | [0.02, 0.84] | 0.16 | 2.76 | 0.007
High | Worst | 1.29 | [0.88, 1.71] | 0.16 | 8.29 | < .001
Low | Worst | 0.86 | [0.45, 1.27] | 0.16 | 5.53 | < .001
Marginal contrasts estimated at behavior
p-values are uncorrected.
In the worst case scenario, suppose you do two tests: first A vs. B, and then A vs. C, each at the 99% confidence level.
Run the first test. Make a Type I error 1% of the time.
| A vs B Type I error | Probability |
|---|---|
| Yes | 0.01 |
| No | 0.99 |
Now, run the second test. Assume (perhaps wrongly) that comparing A to C is independent of your A-B test result. What is the error rate now?
Assuming there is a 1% chance of making an error in either test, independently …
| – | Error in A vs. C | No Error | Total |
|---|---|---|---|
| Type I error in A vs. B | 0.0001 | 0.0099 | 0.01 |
| No Type I error in A-B | 0.0099 | 0.9801 | 0.99 |
| Total | 0.01 | 0.99 | 1.00 |
So you will make an error in the A-B or A-C comparison 1.99% of the time, rather than the nominal \(\alpha = 0.01\) error rate.
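The arithmetic above is easy to verify directly in R:

```r
# Chance of at least one Type I error across two independent tests,
# each run with alpha = 0.01 (99% confidence):
alpha <- 0.01
fwer_two <- 1 - (1 - alpha)^2
fwer_two   # 0.0199, i.e., 1.99%
```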
If instead we ran six pairwise tests, each at a 5% error rate, then even if they were independent we could wind up with an overall error rate of
\(.05 + (.95)(.05) + (.95)^2(.05) + (.95)^3(.05) + (.95)^4(.05) + (.95)^5(.05) = .265\)
Or worse, if they’re not independent.
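More generally, for \(m\) independent tests each run at level \(\alpha\), the family-wise error rate is \(1 - (1 - \alpha)^m\). A small sketch:

```r
# Family-wise error rate for m independent tests at per-test level alpha
fwer <- function(alpha, m) 1 - (1 - alpha)^m

fwer(0.05, 6)   # about 0.265, matching the sum above
fwer(0.01, 6)   # about 0.0585, even at the 99% confidence level
```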
One remedy is the Bonferroni correction: if we do 6 tests, we reduce the necessary \(\alpha\) for each individual test to \(0.01 / 6 \approx 0.00167\), which maintains an error rate no higher than \(\alpha = 0.01\) across the 6 tests.
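Base R's p.adjust() implements this as a p-value inflation rather than a threshold reduction (the two views are equivalent); a quick sketch with hypothetical p-values:

```r
# Bonferroni: test each of m = 6 comparisons at alpha / m
alpha <- 0.01
m <- 6
alpha / m   # per-test threshold, about 0.00167

# equivalently, inflate each p-value by the number of tests (capped at 1):
p.adjust(c(0.002, 0.0001, 0.3), method = "bonferroni")
```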
Suppose you have \(m\) comparisons, with p-values sorted from low to high as \(p_1\), \(p_2\), …, \(p_m\). The Holm procedure compares \(p_1\) to \(\alpha/m\), then \(p_2\) to \(\alpha/(m-1)\), and so on, ending with \(p_m\) compared to \(\alpha\), stopping at the first comparison that fails.
This is uniformly more powerful than Bonferroni, while preserving the overall false positive rate at \(\alpha\). It’s also the default choice of estimate_contrasts().
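The Holm step-down rule is what stats::p.adjust() computes with method = "holm"; a small sketch with made-up p-values:

```r
# Holm: sort p-values ascending, multiply p_(i) by (m - i + 1),
# enforce monotonicity, then compare each to alpha.
p_raw <- c(0.030, 0.001, 0.004, 0.200)   # hypothetical p-values
p.adjust(p_raw, method = "holm")
# -> 0.060 0.004 0.012 0.200 (returned in the original order)
```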
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 99% CI | SE | t(84) | p
--------------------------------------------------------------------
Best | High | 0.71 | [ 0.20, 1.22] | 0.16 | 4.55 | < .001
Best | Low | 1.14 | [ 0.64, 1.65] | 0.16 | 7.32 | < .001
Best | Worst | 2.01 | [ 1.50, 2.51] | 0.16 | 12.85 | < .001
High | Low | 0.43 | [-0.08, 0.94] | 0.16 | 2.76 | 0.007
High | Worst | 1.29 | [ 0.79, 1.80] | 0.16 | 8.29 | < .001
Low | Worst | 0.86 | [ 0.36, 1.37] | 0.16 | 5.53 | < .001
Marginal contrasts estimated at behavior
p-value adjustment method: Holm (1979)
con_holm <- estimate_contrasts(model_one, ci = 0.99, p_adjust = "holm")
con_holm_tib <- tibble(con_holm) |>
  mutate(contr = str_c(Level1, " - ", Level2))

con_holm_tib |> head()

# A tibble: 6 × 10
Level1 Level2 Difference CI_low CI_high SE df t p contr
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Best High 0.711 0.204 1.22 0.156 84 4.55 3.52e- 5 Best - Hi…
2 Best Low 1.14 0.635 1.65 0.156 84 7.32 5.53e-10 Best - Low
3 Best Worst 2.01 1.50 2.51 0.156 84 12.8 9.54e-21 Best - Wo…
4 High Low 0.431 -0.0759 0.939 0.156 84 2.76 7.04e- 3 High - Low
5 High Worst 1.29 0.787 1.80 0.156 84 8.29 7.87e-12 High - Wo…
6 Low Worst 0.863 0.356 1.37 0.156 84 5.53 1.07e- 6 Low - Wor…
ggplot(con_holm_tib, aes(x = contr, y = Difference)) +
  geom_point(size = 3, col = "purple") +
  geom_errorbar(aes(ymin = CI_low, ymax = CI_high)) +
  geom_hline(yintercept = 0, col = "red", lty = "dashed") +
  labs(title = "Holm 99% Intervals for Health Outcomes",
       x = "Contrast",
       y = "Difference in Health Outcomes Score")

Tukey’s HSD approach is a better choice for pre-planned comparisons with a balanced (or nearly balanced) design.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 99% CI | SE | t(84) | p
--------------------------------------------------------------------
Best | High | 0.71 | [ 0.21, 1.21] | 0.16 | 4.55 | < .001
Best | Low | 1.14 | [ 0.64, 1.64] | 0.16 | 7.32 | < .001
Best | Worst | 2.01 | [ 1.50, 2.51] | 0.16 | 12.85 | < .001
High | Low | 0.43 | [-0.07, 0.93] | 0.16 | 2.76 | 0.035
High | Worst | 1.29 | [ 0.79, 1.80] | 0.16 | 8.29 | < .001
Low | Worst | 0.86 | [ 0.36, 1.36] | 0.16 | 5.53 | < .001
Marginal contrasts estimated at behavior
p-value adjustment method: Tukey
The assumptions behind analysis of variance are those of a linear model. Of specific interest are:
- Normality of the residuals
- Constant variance (homoscedasticity) across the groups
- Independence of the observations
Happily, the ANOVA F test is fairly robust to violations of the Normality assumption.
We check assumptions of linear models like ANOVA with residual plots, for instance with a Normal Q-Q plot.
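A sketch of such a check (the ohio20 data aren't bundled here, so this uses simulated stand-in data; in the source, model_one is lm(outcomes ~ behavior, data = ohio20)):

```r
set.seed(431)
# stand-in data with the same structure: 88 counties in 4 groups of 22
demo <- data.frame(
  outcomes = rnorm(88),
  behavior = factor(rep(c("Best", "High", "Low", "Worst"), each = 22)))
model_demo <- lm(outcomes ~ behavior, data = demo)

# Normal Q-Q plot of the residuals
qqnorm(resid(model_demo))
qqline(resid(model_demo), col = "red")
```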
Would a transformation have been helpful in this setting?
Our model was: lm(outcomes ~ behavior, data = ohio20)
# A tibble: 4 × 2
behavior `min(outcomes)`
<fct> <dbl>
1 Best -0.332
2 High -0.350
3 Low -1.15
4 Worst -2.05
No: the usual transformations (such as the logarithm) require the outcome (outcomes) to be strictly positive, and the minimum values above show that ours is not.
Could we instead avoid assuming equal population variances? Yes, but this isn’t exciting if we have a balanced design.
One-way analysis of means (not assuming equal variances)
data: outcomes and behavior
F = 43.145, num df = 3.000, denom df = 45.494, p-value = 2.349e-13
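Output like the above comes from base R's oneway.test() with var.equal = FALSE; sketched here on simulated stand-in data, since ohio20 isn't bundled:

```r
set.seed(20240919)
demo <- data.frame(
  outcomes = rnorm(88),
  behavior = factor(rep(c("Best", "High", "Low", "Worst"), each = 22)))

# Welch's one-way ANOVA: does not assume equal group variances
res <- oneway.test(outcomes ~ behavior, data = demo, var.equal = FALSE)
res
```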
If you thought the data were severely skewed, you might try:
Kruskal-Wallis rank sum test
data: outcomes by behavior
Kruskal-Wallis chi-squared = 61.596, df = 3, p-value = 2.681e-13
The Kruskal-Wallis test assesses the null hypothesis that the behavior groups have the same center to their outcomes distributions. What would be the conclusion here?
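The Kruskal-Wallis call itself is one line of base R; a sketch on simulated stand-in data:

```r
set.seed(431)
demo <- data.frame(
  outcomes = rnorm(88),
  behavior = factor(rep(c("Best", "High", "Low", "Worst"), each = 22)))

# rank-based test: compares the centers of the four distributions
res_kw <- kruskal.test(outcomes ~ behavior, data = demo)
res_kw
```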
set.seed(20240919)
model_1b <- stan_glm(outcomes ~ behavior, data = ohio20, refresh = 0)
model_parameters(model_1b, ci = 0.99)

Parameter | Median | 99% CI | pd | Rhat | ESS | Prior
--------------------------------------------------------------------------------------------
(Intercept) | 0.97 | [ 0.67, 1.25] | 100% | 1.001 | 1781.00 | Normal (4.09e-11 +- 2.23)
behaviorHigh | -0.71 | [-1.11, -0.29] | 100% | 1.000 | 1858.00 | Normal (0.00 +- 5.11)
behaviorLow | -1.14 | [-1.55, -0.74] | 100% | 1.000 | 2230.00 | Normal (0.00 +- 5.11)
behaviorWorst | -2.00 | [-2.42, -1.59] | 100% | 1.001 | 2090.00 | Normal (0.00 +- 5.11)
Estimated Marginal Means
behavior | Mean | 99% CI | pd
------------------------------------------
Best | 0.97 | [ 0.67, 1.25] | 100%
High | 0.25 | [-0.04, 0.54] | 98.83%
Low | -0.18 | [-0.47, 0.13] | 94.00%
Worst | -1.04 | [-1.34, -0.76] | 100%
Marginal means estimated at behavior
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 99% CI | pd | % in ROPE
----------------------------------------------------------------
Best | High | 0.71 | [0.29, 1.11] | 100% | 0%
Best | Low | 1.14 | [0.74, 1.55] | 100% | 0%
Best | Worst | 2.00 | [1.59, 2.42] | 100% | 0%
High | Low | 0.43 | [0.01, 0.85] | 99.52% | 0%
High | Worst | 1.29 | [0.90, 1.71] | 100% | 0%
Low | Worst | 0.86 | [0.44, 1.31] | 100% | 0%
Marginal contrasts estimated at behavior
con_ols <- as_tibble(estimate_contrasts(model_one, ci = 0.99,
                                        p_adjust = "holm")) |>
  mutate(contr = str_c(Level1, " - ", Level2))

con_bay <- as_tibble(estimate_contrasts(model_1b, ci = 0.99,
                                        p_adjust = "holm")) |>
  mutate(contr = str_c(Level1, " - ", Level2))
p1 <- ggplot(con_ols, aes(y = contr, x = Difference)) +
  geom_point(size = 3, color = "firebrick") +
  geom_errorbar(aes(xmin = CI_low, xmax = CI_high)) +
  geom_vline(xintercept = 0, col = "red", lty = "dashed") +
  labs(title = "OLS fit",
       subtitle = "Holm 99% Intervals for Health Outcomes",
       y = "Contrast", x = "Difference in Health Outcomes Score")

p2 <- ggplot(con_bay, aes(y = contr, x = Difference)) +
  geom_point(size = 3, color = "navy") +
  geom_errorbar(aes(xmin = CI_low, xmax = CI_high)) +
  geom_vline(xintercept = 0, col = "red", lty = "dashed") +
  labs(title = "Bayesian fit",
       subtitle = "Holm 99% Intervals for Health Outcomes",
       y = "Contrast", x = "Difference in Health Outcomes Score")

p1 + p2

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Locale:
LC_COLLATE=English_United States.utf8
LC_CTYPE=English_United States.utf8
LC_MONETARY=English_United States.utf8
LC_NUMERIC=C
LC_TIME=English_United States.utf8
Package version:
abind_1.4-8 askpass_1.2.0 backports_1.5.0
base64enc_0.1-3 bayesplot_1.11.1 bayestestR_0.14.0
BH_1.84.0.0 bit_4.0.5 bit64_4.0.5
blob_1.2.4 boot_1.3-31 broom_1.0.6
bslib_0.8.0 cachem_1.1.0 callr_3.7.6
car_3.1-2 carData_3.0-5 cellranger_1.1.0
checkmate_2.3.2 cli_3.6.3 clipr_0.8.0
cluster_2.1.6 coda_0.19-4.1 codetools_0.2-20
colorspace_2.1-1 colourpicker_1.3.0 commonmark_1.9.1
compiler_4.4.1 conflicted_1.2.0 correlation_0.8.5
cowplot_1.1.3 cpp11_0.5.0 crayon_1.5.3
crosstalk_1.2.1 curl_5.2.2 data.table_1.16.0
datasets_4.4.1 datawizard_0.12.3 DBI_1.2.3
dbplyr_2.5.0 Deriv_4.1.6 desc_1.4.3
digest_0.6.37 distributional_0.5.0 doBy_4.6.22
dplyr_1.1.4 DT_0.33 dtplyr_1.3.1
dygraphs_1.1.1.6 easystats_0.7.3 effectsize_0.8.9
emmeans_1.10.4 estimability_1.5.1 evaluate_1.0.0
fansi_1.0.6 farver_2.1.2 fastmap_1.2.0
fontawesome_0.5.2 forcats_1.0.0 foreign_0.8-87
Formula_1.2-5 fs_1.6.4 gargle_1.5.2
generics_0.1.3 ggplot2_3.5.1 ggrepel_0.9.6
ggridges_0.5.6 glue_1.7.0 googledrive_2.1.1
googlesheets4_1.1.1 graphics_4.4.1 grDevices_4.4.1
grid_4.4.1 gridExtra_2.3 gtable_0.3.5
gtools_3.9.5 haven_2.5.4 highr_0.11
Hmisc_5.1-3 hms_1.1.3 htmlTable_2.4.3
htmltools_0.5.8.1 htmlwidgets_1.6.4 httpuv_1.6.15
httr_1.4.7 ids_1.0.1 igraph_2.0.3
inline_0.3.19 insight_0.20.4 isoband_0.2.7
janitor_2.2.0 jquerylib_0.1.4 jsonlite_1.8.9
knitr_1.48 labeling_0.4.3 later_1.3.2
lattice_0.22-6 lazyeval_0.2.2 lifecycle_1.0.4
lme4_1.1-35.5 loo_2.8.0 lubridate_1.9.3
magrittr_2.0.3 markdown_1.13 MASS_7.3-61
Matrix_1.7-0 MatrixModels_0.5.3 matrixStats_1.4.1
memoise_2.0.1 methods_4.4.1 mgcv_1.9-1
microbenchmark_1.5.0 mime_0.12 miniUI_0.1.1.1
minqa_1.2.8 modelbased_0.8.8 modelr_0.1.11
multcomp_1.4-26 munsell_0.5.1 mvtnorm_1.3-1
nlme_3.1-164 nloptr_2.1.1 nnet_7.3-19
numDeriv_2016.8.1.1 openssl_2.2.1 parallel_4.4.1
parameters_0.22.2 patchwork_1.3.0 pbkrtest_0.5.3
performance_0.12.3 pillar_1.9.0 pkgbuild_1.4.4
pkgconfig_2.0.3 plyr_1.8.9 posterior_1.6.0
prettyunits_1.2.0 processx_3.8.4 progress_1.2.3
promises_1.3.0 ps_1.8.0 purrr_1.0.2
quantreg_5.98 QuickJSR_1.3.1 R6_2.5.1
ragg_1.3.2 rappdirs_0.3.3 RColorBrewer_1.1.3
Rcpp_1.0.13 RcppEigen_0.3.4.0.2 RcppParallel_5.1.9
readr_2.1.5 readxl_1.4.3 rematch_2.0.0
rematch2_2.1.2 report_0.5.9 reprex_2.1.1
reshape2_1.4.4 rlang_1.1.4 rmarkdown_2.28
rpart_4.1.23 rstan_2.32.6 rstanarm_2.32.1
rstantools_2.4.0 rstudioapi_0.16.0 rvest_1.0.4
sandwich_3.1-1 sass_0.4.9 scales_1.3.0
see_0.9.0 selectr_0.4.2 shiny_1.9.1
shinyjs_2.1.0 shinystan_2.6.0 shinythemes_1.2.0
snakecase_0.11.1 sourcetools_0.1.7.1 SparseM_1.84.2
splines_4.4.1 StanHeaders_2.32.10 stats_4.4.1
stats4_4.4.1 stringi_1.8.4 stringr_1.5.1
survival_3.7-0 sys_3.4.2 systemfonts_1.1.0
tensorA_0.36.2.1 textshaping_0.4.0 TH.data_1.1-2
threejs_0.3.3 tibble_3.2.1 tidyr_1.3.1
tidyselect_1.2.1 tidyverse_2.0.0 timechange_0.3.0
tinytex_0.53 tools_4.4.1 tzdb_0.4.0
utf8_1.2.4 utils_4.4.1 uuid_1.2.1
V8_5.0.1 vctrs_0.6.5 viridis_0.6.5
viridisLite_0.4.2 vroom_1.6.5 withr_3.0.1
xfun_0.47 xml2_1.3.6 xtable_1.8-4
xts_0.14.0 yaml_2.3.10 zoo_1.8-12
431 Class 08 | 2024-09-19 | https://thomaselove.github.io/431-2024/